Search for: All records

Creators/Authors contains: "You, Y"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Modeling population dynamics is a fundamental problem with broad scientific applications. Motivated by real-world applications including biosystems with diverse populations, we consider a class of population dynamics modeling with two technical challenges: (i) the dynamics to learn for individual particles are heterogeneous and (ii) the available data to learn from are not time series (i.e., each individual’s state trajectory over time) but cross-sectional (i.e., the whole population’s aggregated states without individuals matched over time). To address these challenges, we introduce a novel computational framework dubbed correlational Lagrangian Schrödinger bridge (CLSB) that builds on optimal transport to “bridge” cross-sectional data distributions. In contrast to prior methods that regularize all individuals’ transport “costs” and then apply them to the population homogeneously, CLSB directly regularizes the population cost, allowing for population heterogeneity and potentially improving model generalizability. Specifically, our contributions include (1) a novel population perspective of the transport cost and a new class of population regularizers capturing the temporal variations in multivariate relations, with the tractable formulation derived, (2) three domain-informed instantiations of population regularizers on covariance, and (3) integration of population regularizers into data-driven generative models as constrained optimization and an approximate numerical solution, with further extension to conditional generative models. Empirically, we demonstrate the superiority of CLSB in single-cell sequencing data analyses (including cell differentiation and drug-conditioned cell responses) and opinion depolarization.
    Free, publicly-accessible full text available December 1, 2025
  2. Free, publicly-accessible full text available December 10, 2025
  3. Generating 3D graphs of symmetry-group equivariance is of intriguing potential in broad applications from machine vision to molecular discovery. Emerging approaches adopt diffusion generative models (DGMs) with proper re-engineering to capture 3D graph distributions. In this paper, we raise an orthogonal and fundamental question of in what (latent) space we should diffuse 3D graphs. ❶ We motivate the study with theoretical analysis showing that the performance bound of 3D graph diffusion can be improved in a latent space versus the original space, provided that the latent space is of (i) low dimensionality yet (ii) high quality (i.e., low reconstruction error) and DGMs have (iii) symmetry preservation as an inductive bias. ❷ Guided by the theoretical guidelines, we propose to perform 3D graph diffusion in a low-dimensional latent space, which is learned through cascaded 2D–3D graph autoencoders for low-error reconstruction and symmetry-group invariance. The overall pipeline is dubbed latent 3D graph diffusion. ❸ Motivated by applications in molecular discovery, we further extend latent 3D graph diffusion to conditional generation given SE(3)-invariant attributes or equivariant 3D objects. ❹ We also demonstrate empirically that out-of-distribution conditional generation can be further improved by regularizing the latent space via graph self-supervised learning. We validate through comprehensive experiments that our method generates 3D molecules of higher validity / drug-likeliness and comparable or better conformations / energetics, while being an order of magnitude faster in training. Codes are released at https://github.com/Shen-Lab/LDM-3DG. 
  4. Transfer learning on graphs drawn from varied distributions (domains) is in great demand across many applications. Emerging methods attempt to learn domain-invariant representations using graph neural networks (GNNs), yet the empirical performances vary and the theoretical foundation is limited. This paper aims at designing theory-grounded algorithms for graph domain adaptation (GDA). (i) As the first attempt, we derive a model-based GDA bound closely related to two GNN spectral properties: spectral smoothness (SS) and maximum frequency response (MFR). This is achieved by cross-pollinating between the OT-based (optimal transport) DA and graph filter theories. (ii) Inspired by the theoretical results, we propose algorithms regularizing spectral properties of SS and MFR to improve GNN transferability. We further extend the GDA theory into the more challenging scenario of conditional shift, where spectral regularization still applies. (iii) More importantly, our analyses of the theory reveal which regularization improves performance in which transfer learning scenario, (iv) in numerical agreement with extensive real-world experiments: SS and MFR regularizations bring more benefits to the scenarios of node transfer and link transfer, respectively. In a nutshell, our study paves the way toward explicitly constructing and training GNNs that can capture more transferable representations across graph domains. Codes are released at https://github.com/Shen-Lab/GDA-SpecReg.
  5. Hypothesis: Understanding the microscopic driving force of water wetting is challenging and important for the design of materials. The relations between structure, dynamics and hydrogen bonds of interfacial water can be investigated using molecular dynamics simulations. Experiments and simulations: Contact angles at the alumina (0001) and ( ) surfaces are studied using both classical molecular dynamics simulations and experiments. To test the superhydrophilicity, the free energy cost of removing waters near the interfaces is calculated using the density fluctuations method. The strength of hydrogen bonds is determined by their lifetime and geometry. Findings: Both surfaces are superhydrophilic, and the (0001) surface is more hydrophilic. Interactions between surfaces and interfacial waters promote a templating effect whereby the latter are aligned in a pattern that follows the underlying lattice of the surfaces. Translational and rotational dynamics of interfacial water molecules are slower than in bulk water. Hydrogen bonds between water and both surfaces are asymmetric: water-to-aluminol ones are stronger than aluminol-to-water ones. Molecular dynamics simulations eliminate the impacts of surface contamination when measuring contact angles, and the results reveal the microscopic origin of the macroscopic superhydrophilicity of alumina surfaces: strong water-to-aluminol hydrogen bonds.
  6. Approaches to in silico prediction of protein structures have been revolutionized by AlphaFold2, while those to predict interfaces between proteins are relatively underdeveloped, owing to the overly complicated yet relatively limited data of protein–protein complexes. In short, proteins are 1D sequences of amino acids folding into 3D structures, and interact to form assemblies to function. We believe that such intricate scenarios are better modeled with additional indicative information that reflects their multi-modality nature and multi-scale functionality. To improve binary prediction of inter-protein residue-residue contacts, we propose to augment input features with multi-modal representations and to synergize the objective with auxiliary predictive tasks. (i) We first progressively add three protein modalities into models: protein sequences, sequences with evolutionary information, and structure-aware intra-protein residue contact maps. We observe that utilizing all data modalities delivers the best prediction precision. Analysis reveals that evolutionary and structural information benefit predictions on the difficult and rigid protein complexes, respectively, assessed by the resemblance to native residue contacts in bound complex structures. (ii) We next introduce three auxiliary tasks via self-supervised pre-training (binary prediction of protein-protein interaction (PPI)) and multi-task learning (prediction of inter-protein residue–residue distances and angles). Although PPI prediction is reported to benefit from predicting intercontacts (as causal interpretations), it is not found vice versa in our study. Similarly, the finer-grained distance and angle predictions did not appear to uniformly improve contact prediction either. This again reflects the high complexity of protein–protein complex data, for which designing and incorporating synergistic auxiliary tasks remains challenging. 
  7. We present the first measurement of cosmic-ray fluxes of $^{6}\mathrm{Li}$ and $^{7}\mathrm{Li}$ isotopes in the rigidity range from 1.9 to 25 GV. The measurements are based on $9.7 \times 10^{5}$ $^{6}\mathrm{Li}$ and $1.04 \times 10^{6}$ $^{7}\mathrm{Li}$ nuclei collected by the Alpha Magnetic Spectrometer on the International Space Station from May 2011 to October 2023. We observe that over the entire rigidity range the $^{6}\mathrm{Li}$ and $^{7}\mathrm{Li}$ fluxes exhibit nearly identical time variations and, above 4 GV, the time variations of $^{6}\mathrm{Li}$, $^{7}\mathrm{Li}$, He, Be, B, C, N, and O fluxes are identical. Above 7 GV, we find an identical rigidity dependence of the $^{6}\mathrm{Li}$ and $^{7}\mathrm{Li}$ fluxes. This shows that they are both produced by collisions of heavier cosmic-ray nuclei with the interstellar medium and, in particular, excludes the existence of a sizable primary component in the $^{7}\mathrm{Li}$ flux. Published by the American Physical Society 2025
    Free, publicly-accessible full text available May 1, 2026
  8. We report the properties of precision time structures of cosmic nuclei He, Li, Be, B, C, N, and O fluxes over an 11-year solar cycle from May 2011 to November 2022 in the rigidity range from 1.92 to 60.3 GV. The nuclei fluxes show similar but not identical time variations with amplitudes decreasing with increasing rigidity. In particular, below 3.64 GV the Li, Be, and B fluxes, and below 2.15 GV the C, N, and O fluxes, are significantly less affected by solar modulation than the He flux. We observe that these differences in solar modulation are linearly correlated with the differences in the spectral indices of the cosmic nuclei fluxes. This shows, in a model-independent way, that solar modulation of galactic cosmic nuclei depends on their spectral shape. In addition, solar modulation differences due to nuclei velocity dependence on the mass-to-charge ratio ($A/Z$) are not observed. Published by the American Physical Society 2025
    Free, publicly-accessible full text available February 1, 2026
  9. Precision measurements by the Alpha Magnetic Spectrometer (AMS) on the International Space Station of the deuteron (D) flux are presented. The measurements are based on $21 \times 10^{6}$ D nuclei in the rigidity range from 1.9 to 21 GV collected from May 2011 to April 2021. We observe that over the entire rigidity range the D flux exhibits nearly identical time variations with the p, $^{3}\mathrm{He}$, and $^{4}\mathrm{He}$ fluxes. Above 4.5 GV, the $\mathrm{D}/^{4}\mathrm{He}$ flux ratio is time independent and its rigidity dependence is well described by a single power law $R^{\Delta}$ with $\Delta_{\mathrm{D}/^{4}\mathrm{He}} = 0.108 \pm 0.005$. This is in contrast with the $^{3}\mathrm{He}/^{4}\mathrm{He}$ flux ratio for which we find $\Delta_{^{3}\mathrm{He}/^{4}\mathrm{He}} = 0.289 \pm 0.003$. Above 13 GV we find a nearly identical rigidity dependence of the D and p fluxes with a $\mathrm{D}/\mathrm{p}$ flux ratio of $0.027 \pm 0.001$. These unexpected observations indicate that cosmic deuterons have a sizable primarylike component. With a method independent of cosmic ray propagation, we obtain the primary component of the D flux equal to $9.4 \pm 0.5\%$ of the $^{4}\mathrm{He}$ flux and the secondary component of the D flux equal to $58 \pm 5\%$ of the $^{3}\mathrm{He}$ flux. Published by the American Physical Society 2024
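
The single-power-law description of a flux ratio quoted in item 9 can be illustrated numerically. The following is a minimal sketch and not the AMS analysis: it takes only the central values of the two spectral indices from the abstract, ignores their uncertainties, and assumes for illustration that both power laws hold between 4.5 and 21 GV. The function name `ratio_scaling` is hypothetical.

```python
# Spectral indices quoted in the abstract (central values only; uncertainties ignored).
DELTA_D_HE4 = 0.108    # D / 4He flux ratio
DELTA_HE3_HE4 = 0.289  # 3He / 4He flux ratio


def ratio_scaling(r_low_gv: float, r_high_gv: float, delta: float) -> float:
    """Factor by which a flux ratio proportional to R**delta changes
    when the rigidity R goes from r_low_gv to r_high_gv (both in GV)."""
    if r_low_gv <= 0 or r_high_gv <= 0:
        raise ValueError("rigidities must be positive")
    return (r_high_gv / r_low_gv) ** delta


if __name__ == "__main__":
    # Assumed illustration range: 4.5 GV to the 21 GV upper edge of the measurement.
    for name, delta in (("D/4He", DELTA_D_HE4), ("3He/4He", DELTA_HE3_HE4)):
        factor = ratio_scaling(4.5, 21.0, delta)
        print(f"{name}: ratio changes by a factor of {factor:.3f} from 4.5 to 21 GV")
```

Under these assumptions, the smaller index means the D/4He ratio rises more slowly with rigidity than the 3He/4He ratio, which is the contrast the abstract highlights.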